Performance Evaluation of Tiling for the Register Level
Authors
Abstract
Tiling is a well-known loop transformation that is mainly used to expose coarse-grain parallelism and to exploit data reuse at the cache level. However, it can also be used to exploit data reuse at the register level and to improve a program's instruction-level parallelism (ILP). Previous work on tiling, as well as commercial compilers, can perform tiling for the register level in more than one dimension when the iteration space is rectangular. Non-rectangular iteration spaces, however, are commonly found in linear algebra algorithms and can also arise from earlier transformations such as loop skewing. In this paper we evaluate the technique presented in [11], which performs tiling for the register level in more than one dimension in both rectangular and non-rectangular iteration spaces. We use typical linear algebra algorithms with non-rectangular iteration spaces as benchmarks and compare our proposal against commercial preprocessors that perform optimizing code transformations such as inner unrolling, outer unrolling and software pipelining. We also present quantitative data showing the benefits of tiling only for the register level, tiling only for the cache level, and tiling for both levels simultaneously. Results measured on an Alpha 21164 processor show that tiling for both the cache and register levels improves upon commercial compilers and preprocessors by factors ranging from 1.3 to 6.3.
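To make the idea of register-level tiling over a non-rectangular iteration space concrete, the following is a minimal hand-written sketch. It is not the technique of [11]. It assumes a lower-triangular matrix-vector product as the kernel and a fixed 2x2 tile, and the function names trmv_reference and trmv_register_tiled are hypothetical. Python does not expose registers, so the scalar locals (acc0, acc1, x0, x1) only stand in for values that a compiled language would keep in registers; the point is the loop structure, in which full tiles are separated from the boundary iterations along the diagonal.

```python
def trmv_reference(L, x):
    """Untiled triangular loop nest: y[i] = sum of L[i][j] * x[j] for j <= i."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(i + 1):  # inner bound depends on i: non-rectangular
            y[i] += L[i][j] * x[j]
    return y


def trmv_register_tiled(L, x):
    """Same computation with a 2x2 tile (illustrative sketch only)."""
    n = len(x)
    y = [0.0] * n
    i = 0
    while i + 1 < n:
        acc0 = 0.0  # running sum for row i
        acc1 = 0.0  # running sum for row i + 1
        row0, row1 = L[i], L[i + 1]
        j = 0
        # Full tiles: columns j and j + 1 lie inside the triangle for both rows.
        while j + 1 <= i:
            x0, x1 = x[j], x[j + 1]  # each x value is loaded once, used twice
            acc0 += row0[j] * x0 + row0[j + 1] * x1
            acc1 += row1[j] * x0 + row1[j + 1] * x1
            j += 2
        # Boundary iterations next to the diagonal, where the rows differ in length.
        for jj in range(j, i + 1):
            acc0 += row0[jj] * x[jj]
        for jj in range(j, i + 2):
            acc1 += row1[jj] * x[jj]
        y[i], y[i + 1] = acc0, acc1
        i += 2
    if i < n:  # leftover row when n is odd
        acc = 0.0
        for j in range(i + 1):
            acc += L[i][j] * x[j]
        y[i] = acc
    return y
```

In the full tiles each loaded element of x feeds two multiply-adds and the two accumulators are independent, which is the kind of register reuse and ILP the abstract refers to. The extra boundary and leftover loops are the cost of the triangular bounds.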
Similar resources
The Efficacy of an SFL-Oriented Register Instruction in Improving Iranian EFL Learners’ Writing Performance and Perception: Language Proficiency in Focus
The current study sought to explore the impact of SFL-oriented register instruction on Iranian EFL learners’ writing performance with a central focus on their English proficiency level. As its secondary aim, the study delved deeply into the learners’ perception of the register-based instruction. To these ends, 50 intermediate and 50 advanced Iranian EFL learners were selected randomly and assign...
Hierarchical tiling for improved superscalar performance
It takes more than a good algorithm to achieve high performance: inner-loop performance and data locality are also important. Tiling is a well-known method for parallelization and for improving data locality. However, tiling has the potential of being even more beneficial. At the finest granularity, it can be used to guide register allocation and instruction scheduling; at the coarsest level, it c...
Optimized Dense Matrix Multiplication on a Many-Core Architecture
Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), belong to a new set of manycore-on-a-chip systems with a software managed memory hierarchy. New programming and compiling methodologies are required to fully exploit the potential of this new class of architectures. In this pape...
PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in iterative compilation and automatic tu...
The Deleterious Nature of Interacting Tiling Optimizations
A compiler may perform multiple optimizations, each with its own goal and cost function. While it is acknowledged that optimizations can interact, in practice the interactions are often ignored, and assumed to have no deleterious effects. In this paper, we demonstrate for optimizations involving tiling that the interactions have unexpectedly harmful effects on overall performance. Current trends ...